System for determination of potential customer status

ABSTRACT

A system is described which accepts corporate and employee data from an interested company and a prospective company, calculates a probability that the prospective company will form a successful relationship with the interested company, and generates campaign data for those prospective companies.

This application is a continuation of “A System for Determination of Potential Customer Status”, Application No. 62/502,706, filed on May 7, 2017, and a continuation in part of “A System for Semantic Determination of Job Titles”, application Ser. No. 15/968,751, filed on May 2, 2018.

FIELD OF THE INVENTION

The present invention generally relates to the analysis of corporate data to generate campaign data for a company based on whether or not a company is likely to have a successful relationship with another company.

Marketing systems send out requests to companies to try to form relationships based on very limited data; this could be based on being in a certain industry, a personal referral or just a guess.

In the prior art, companies may use rules based on expert analysis, for instance a point-based system which adds points if someone opens an email from the company, or rules based on headcount ratios such as the manager-to-engineer ratio.

There is a lot of data available for public companies, but no reliable way to determine whether or not a relationship has a chance to succeed.

What is needed is a system for determining the probability of a successful relationship between companies, and for quantifying that probability in such a way that a company can determine how much effort to put into such a relationship.

SUMMARY

A system is described which accepts corporate and employee data from an interested company and a prospective company, and calculates a probability that the prospective company will form a successful relationship with the interested company.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a workflow of how the system is trained.

FIG. 2 shows a workflow of how the system predicts whether a relationship could occur.

FIG. 3 shows one embodiment of the network used to train and predict the relationship.

FIG. 4 shows one embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

With regard to this invention, the term “business relationship” is an agreement between two business entities to cooperate with each other where one company would sell products and/or services to the other.

With regard to this invention, the term “interested company” is a business entity that is interested in forming a business relationship with another company.

With regard to this invention, the term “prospective company” is a second business entity that the interested company is interested in forming a relationship with.

With regard to this invention, the term “structured data” refers to information that is characterized based on predefined groups. Each of these predefined groups is associated with a string value, called a label, that is unique to that type of structured data. For instance, the size of a company can be characterized as predefined groups of sizes, such as 201-500 employees. In one embodiment, the label might be “201-500 employees”. In another embodiment, the label might be “mid-sized company”.

With regard to this invention, the term “unstructured data” refers to information that is not characterized as an individual value. An example of this would be the text in a job description, where the analysis leverages the text and the sequence in which the text occurs.

With regard to this invention, the term “one-hot encoded” refers to a coding of structured data as a group of all possible values in a pre-determined sequence that are either coded as 1 or 0, such that only the one correct value can be a 1. For example, if there were 5 possible size groups and the company was associated with the third group, the coding would be 00100.
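The one-hot definition above can be illustrated with a minimal Python sketch. The group labels in SIZE_GROUPS and the helper name one_hot are assumptions made for illustration only; the specification does not prescribe them.

```python
def one_hot(label, labels):
    """Return a one-hot list for `label`, given the ordered list `labels`."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return [1 if candidate == label else 0 for candidate in labels]


# Hypothetical size groups in a fixed, pre-determined order.
SIZE_GROUPS = ["1-50", "51-200", "201-500", "501-1000", "1001+"]

size_code = one_hot("201-500", SIZE_GROUPS)
print("".join(str(bit) for bit in size_code))  # prints 00100
```

Raising an error on an unknown label keeps the "exactly one 1" property from being silently violated.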

With regard to this invention, the term “n-hot encoded” refers to a coding of structured data as a group of all possible values in a predetermined sequence that are coded as either 1 or 0, such that the entity can have zero or more values associated with it. This is used to describe an attribute which can have multiple values, such as the set of technologies that the company uses on its website. For example, if there were a total of 5 possible technologies and the company used the first and fourth one, the coding would be 10010.
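A comparable sketch for n-hot encoding follows. The technology list and the helper name n_hot are hypothetical and chosen only to reproduce the 10010 example.

```python
def n_hot(values, vocabulary):
    """Return an n-hot list marking which vocabulary entries occur in `values`."""
    present = set(values)
    unknown = present - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown values: {sorted(unknown)}")
    return [1 if item in present else 0 for item in vocabulary]


# Hypothetical list of all possible technologies, in a fixed order.
TECHNOLOGIES = ["AngularJS", "JavaScript", "NodeJS", "React", "jQuery"]

# A company using the first and fourth technology.
tech_code = n_hot(["AngularJS", "React"], TECHNOLOGIES)
print("".join(str(bit) for bit in tech_code))  # prints 10010
```

An empty input list yields an all-zero vector, which matches the "zero or more values" wording of the definition.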

With regard to this invention, the term “employee profile” includes data about the employee. The data includes direct information, such as a resume. The data also includes indirect information, such as a job description associated with the employee's role in the company.

In one or more embodiments, the system comprises a computer, the computer containing software configured to accept data about a prospective corporation. The software is also configured to accept a neural network model which can process the data about the relationship between the interested and prospective companies. The coefficients associated with the neural network model are configured by processing training data to define its parameters. The training data comprises data about one or more companies where the relationship between the interested and one or more prospective companies is known. The neural network comprises one or more stages, each subsequent stage accepting data from a prior stage. In one or more embodiments, the neural network model is configured to be a convolutional neural network (CNN). In other embodiments, the neural network model is configured as a Long Short Term Memory (LSTM) architecture.

One embodiment of the invention, as shown in FIG. 4, comprises a computer 402. The computer is configured with a Transfer API module 426, the Transfer API module configured to accept data from at least a UI module 404, a Data Crawl module 414, a Partner Dump module 418, and a Data API module 420. The Transfer API 426 accepts data from the Data Crawl Module 414, Partner Dump Module 418, and Data API module 420 and stores it in a Raw Data Store 422 and a Customer Data Store 412. In one or more embodiments, the Raw Data Store 422 is configured to accept raw structured and unstructured data, that is, data which comes either from partners or from web scraping. In one or more embodiments, the Customer Data Store 412 is configured to accept data directly from interested companies, including data about prospective companies. In one or more embodiments, the data in the Customer Data Store 412 is partitioned, so that only information about the interested company is used to model for that interested company. The computer 402 is also configured with a training module 406, the training module 406 coupled to the computer and configured to accept data from the Raw Data Store 422 and the Customer Data Store 412 and generate prediction model parameters which are stored in the Prediction Data Store 408. The computer is also configured to store model data in a Model Data Store 410. The Model Data Store 410 contains the parameters of the model associated with an interested company.

The computer is also configured with a Prediction Module 428, the prediction module configured to accept the prediction model parameters from the model data store for an interested company and generate an output for each new prospective company in the prediction data store 408. The prediction data store 408 is then used by the Campaign API Module 424 to generate the proper messages to such marketing tools as Twitter, Google Ads, and Facebook Ads.

In one or more embodiments, the UI Module 404 is coupled to the computer, configured to request and accept data from one or more sources using one or more implementations of a REST API. Information from these REST API calls is saved to the Customer Data Store 412.

In one or more embodiments, the Partner Dump Module 418 is coupled to the computer, configured to accept data from various partners in formats not limited to but including comma-separated variable (CSV), JavaScript Object Notation (JSON), or text files. The data is then extracted from the files and saved to the Raw Data Store 422.
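As a rough illustration of the extraction step, the following Python sketch turns a partner file's contents into a list of records using only the standard library. The function name parse_partner_dump, the format tags, and the field names are assumptions; the specification does not define a record layout.

```python
import csv
import io
import json


def parse_partner_dump(text, fmt):
    """Extract a list of record dicts from partner data in a given format."""
    if fmt == "csv":
        # The first CSV row is assumed to hold the column names.
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    if fmt == "text":
        # Plain text is kept as one record per non-empty line.
        return [{"text": line} for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported format: {fmt!r}")


csv_records = parse_partner_dump("company,size\nAcme,201-500\n", "csv")
json_records = parse_partner_dump('{"company": "Acme"}', "json")
print(csv_records, json_records)
```

The resulting records would then be written to whatever storage backs the Raw Data Store 422; that step is omitted because the storage technology is not specified.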

In one or more embodiments, the Data Crawl Module 414 is coupled to the computer, configured to accept HTML data from one or more specified web sites. Information from the HTML data is then extracted and output to the Customer Data Store 412.

In one or more embodiments, the Data API Module 420 is configured to request and accept data from one or more sources via an API. In one or more embodiments, this includes but is not limited to the ZoomInfo API (http://www.zoominfo.com/business/zoominfo-new-api-documentation as seen on May 6, 2017).

In one or more embodiments, the Semantic Indexing module 416, coupled to the computer, is configured to store company and people information in such a way that a request for specific company or people data can be executed by searching the embedding data or the vector. The company information is stored as the output of the concatenation step 326 in the training module. As the embedding is semantic, this type of query will identify similar companies and people. In one or more embodiments, the Semantic Indexing Module 416 accepts requests to find other companies similar to the prospective company being analyzed. In one or more embodiments, the Semantic Indexing Module 416 accepts requests to compare two prospective companies to find differences in their index values. For instance, this can be used if two similar companies give very different results in their predictions of successful relationships.
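The two kinds of request described above can be sketched in Python as follows. Cosine similarity is used here as an assumed similarity measure; the specification does not name one. The company names and three-element vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, index, top_k=3):
    """Return the `top_k` (name, score) pairs most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def embedding_difference(a, b):
    """Element-wise difference, to see where two companies' vectors diverge."""
    return [x - y for x, y in zip(a, b)]


# Hypothetical stored embeddings, keyed by company name.
index = {
    "alpha": [1.0, 0.0, 0.0],
    "beta": [0.9, 0.1, 0.0],
    "gamma": [0.0, 1.0, 0.0],
}
nearest = most_similar([1.0, 0.0, 0.0], index, top_k=2)
print([name for name, _ in nearest])  # prints ['alpha', 'beta']
```

A linear scan is adequate for a sketch; a production index over many companies would likely need an approximate nearest-neighbor structure.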

The Training Module 406, coupled to the computer, is configured to accept the model data in the Model Data Store associated with prospective companies which already have known relationships with an interested company. The Training Module 406 generates a neural network model which is initialized as a new model would be initialized, usually with randomized weights. Further, the neural network associated with the training module 406 includes back propagation of the weights to improve the model with each input. The model parameters of the model created by the Training Module 406 are stored in the Model Data Store 410. These model parameters are used by the Prediction Module 428, which does not change the weights using back propagation.

In one or more embodiments, the Campaign API 424, coupled to the computer, is configured to accept Prediction Data and generate a campaign strategy which outputs via one or more APIs, including but not limited to Twitter Ads API, Google Ads API, Facebook Ads API, and Marketo API.
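One way to picture the step from prediction data to a campaign strategy is the Python sketch below. The threshold, the channel identifiers, and the output record layout are all assumptions; no calls to the real advertising APIs are shown because the specification does not describe them.

```python
def plan_campaign(predictions, threshold=0.5, channels=("twitter_ads", "google_ads")):
    """Select prospective companies at or above `threshold`, highest probability first."""
    targets = [(name, p) for name, p in predictions.items() if p >= threshold]
    targets.sort(key=lambda pair: pair[1], reverse=True)
    return [
        {"company": name, "probability": p, "channels": list(channels)}
        for name, p in targets
    ]


# Hypothetical prediction data: company name mapped to predicted probability.
predictions = {"alpha": 0.91, "beta": 0.32, "gamma": 0.67}
campaign = plan_campaign(predictions)
print([entry["company"] for entry in campaign])  # prints ['alpha', 'gamma']
```

Each resulting entry would be translated into the request format of the chosen marketing tool by a separate adapter.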

FIG. 3 shows how the parameters associated with a company are processed. There are six types of input associated with each company: industry group 302, company size 304, employee profiles 306, technology used by the company 308, news associated with the company 336, and intent data 344.

A company is defined as part of this model to be part of a single industry group 302, from a pre-defined list of industries. The result is a one-hot encoded value 310. In one or more embodiments, this value is pre-determined by an expert. In other embodiments, a reference guide can be used. In other embodiments, the industry group 302 is implicit in the model based on the output of the concatenate layer 326, and is not directly input into the model.

A company can be described by its size 304. In one or more embodiments, a company's size can be based on a measure of the number of employees. In one or more embodiments, a company's size can also be measured by revenue. Company size is structured data because the data is bucketized; rather than storing the actual number of employees or dollars of revenue, the value is one-hot encoded 312 to be one of several categories (e.g., 100-200 or 201-500 employees). In one or more embodiments, the category values are pre-determined based on provided corporate data.
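The bucketizing step can be sketched in Python as below. The bucket boundaries and labels are hypothetical, apart from the 100-200 and 201-500 ranges mentioned in the text.

```python
import bisect

# Hypothetical inclusive upper bounds for every bucket except the last.
UPPER_BOUNDS = [99, 200, 500, 1000]
BUCKET_LABELS = ["1-99", "100-200", "201-500", "501-1000", "1001+"]


def size_bucket(employees):
    """Return the index of the size bucket that contains `employees`."""
    if employees < 1:
        raise ValueError("employee count must be positive")
    # bisect_left finds the first upper bound that is >= employees.
    return bisect.bisect_left(UPPER_BOUNDS, employees)


def encode_size(employees):
    """One-hot encode the employee count over the predefined buckets."""
    index = size_bucket(employees)
    return [1 if i == index else 0 for i in range(len(BUCKET_LABELS))]


print(BUCKET_LABELS[size_bucket(350)], encode_size(350))
# prints 201-500 [0, 0, 1, 0, 0]
```

The same approach would apply to revenue, with a different set of boundaries.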

In one or more embodiments, hiring practices include a measure of executive churn, and number of employees hired over a given period of time. In one embodiment, the measure is based on executive turnover ratio. In one or more embodiments this ratio can be categorized by rank. For instance, 1/1 CEOs replaced, 0/1 CFOs replaced, 2/10 C-suite executives replaced. In other embodiments, skills and title vectors could be calculated based on recently departed employees above a certain rank. In other embodiments, more weight could be given in the calculation to higher ranked employees as well as weighing more recent departures vs earlier departures.
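A small Python sketch of the turnover ratio by rank, and of one possible weighting scheme, is given below. The rank weights and the exponential half-life decay are assumptions; the text only states that higher ranks and more recent departures could receive more weight.

```python
from collections import defaultdict


def turnover_by_rank(executives):
    """Return {rank: (replaced, total)} from records with 'rank' and 'replaced'."""
    totals = defaultdict(int)
    replaced = defaultdict(int)
    for executive in executives:
        totals[executive["rank"]] += 1
        if executive["replaced"]:
            replaced[executive["rank"]] += 1
    return {rank: (replaced[rank], totals[rank]) for rank in totals}


def weighted_churn(departures, rank_weights, half_life_months=12.0):
    """Sum rank weights, halving each contribution every `half_life_months`."""
    return sum(
        rank_weights[d["rank"]] * 0.5 ** (d["months_ago"] / half_life_months)
        for d in departures
    )


# Roster matching the example: 1/1 CEOs, 0/1 CFOs, 2/10 C-suite replaced.
roster = (
    [{"rank": "CEO", "replaced": True}, {"rank": "CFO", "replaced": False}]
    + [{"rank": "C-suite", "replaced": i < 2} for i in range(10)]
)
ratios = turnover_by_rank(roster)
print(ratios)  # prints {'CEO': (1, 1), 'CFO': (0, 1), 'C-suite': (2, 10)}

# Hypothetical weights: a CEO departure counts three times a C-suite one.
score = weighted_churn(
    [{"rank": "CEO", "months_ago": 0}, {"rank": "C-suite", "months_ago": 12}],
    {"CEO": 3.0, "C-suite": 1.0},
)
```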

In one or more embodiments, use of social media is characterized by frequency of use of one or more corporate accounts. In one or more embodiments, corporate accounts could be an aggregation of official company accounts and those owned by “C” level managers, such as a CEO or CTO. In one or more embodiments, the data is run through an LSTM process, where one word from a post would represent a single data point, where the output of that seeks to understand each post in succession as part of the total social media stream. In another embodiment, doc2vec or a similar piece of software is used to characterize the social media posts.

In one or more embodiments, intent data 344 is “data about business users web content consumption” (bombora.com, Apr. 5, 2017). In one or more embodiments, the intent data 344 is aggregated based on a categorization of the type of interaction with employees of a prospective company. This measure includes, but is not limited to, requests for literature or watching a webinar. In other embodiments, it is characterized by the number and frequency of corporate IP addresses which go to a particular website. In one or more embodiments, intent data is run through an LSTM process. In other embodiments, a skills vector approach can be used. For example, data is created around each different event which may happen as intent data, such as opening mail, clicking a link or downloading a marketing asset. Each of these is fed successively through an LSTM process, then the output of that goes into another LSTM which understands the successions of actions and summarizes it. Finally the output of that process is fed in as an actions vector for concatenation.

In one or more embodiments, the intent data 344 is stored in such a way to provide a measurement of the relationship between the person or persons who initiated it and what was initiated, and when. In other embodiments, the intent data could be stored to provide a measurement of the relationship between the particular roles at that company and the intent data.
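The preparation of intent events before they reach a sequence model can be sketched as follows. This Python example only covers aggregation by event type and the construction of a time-ordered sequence of one-hot vectors; the LSTM itself is not shown. The event type identifiers and record fields are assumptions.

```python
from collections import Counter

# Hypothetical identifiers for the event types named in the text.
EVENT_TYPES = ["open_mail", "click_link", "download_asset"]


def event_counts(events):
    """Aggregate events into a count per event type, in EVENT_TYPES order."""
    counts = Counter(event["type"] for event in events)
    return [counts[event_type] for event_type in EVENT_TYPES]


def event_sequence(events):
    """Return one one-hot vector per event, ordered by timestamp.

    An event whose type is not in EVENT_TYPES becomes an all-zero vector.
    """
    ordered = sorted(events, key=lambda event: event["timestamp"])
    return [
        [1 if event["type"] == event_type else 0 for event_type in EVENT_TYPES]
        for event in ordered
    ]


# ISO 8601 timestamps sort correctly as plain strings.
events = [
    {"type": "download_asset", "timestamp": "2017-04-03T10:00:00"},
    {"type": "open_mail", "timestamp": "2017-04-01T09:00:00"},
    {"type": "click_link", "timestamp": "2017-04-01T09:05:00"},
    {"type": "open_mail", "timestamp": "2017-04-02T08:30:00"},
]
print(event_counts(events))    # prints [2, 1, 1]
print(event_sequence(events))  # one vector per event, oldest first
```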

In one or more embodiments, the interested company provides detailed information about its people through resumes and job descriptions. This employee profile information 306 can be found through publicly available repositories such as LinkedIn, or by the response of people to emails, applications to attend webinars or conferences, and so forth. In one or more embodiments, the information about each person is stored in sets of words. In other embodiments, the information about each person can be stored in a term frequency-inverse document frequency (tf-idf) form 318, to reflect how important each word is to a document or a corpus of documents (see https://en.wikipedia.org/wiki/Tf-idf). The results are then normalized to unit length and put through an autoencoder 322 prior to adding them into the concatenate layer 326. In one or more embodiments, this means that the vector is scaled so that it has an L2 norm of 1, so that [2, 0, 0] would become [1, 0, 0] and [0.1, 0.2, 0.3] would become approximately [0.267, 0.535, 0.802].
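The tf-idf weighting and unit-length normalization can be sketched in Python as follows. Several tf-idf variants exist; this sketch assumes relative term frequency multiplied by the natural logarithm of the inverse document frequency, which may differ from the variant used in a given embodiment. The sample profiles are made up.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, vectors) for a list of token lists."""
    vocabulary = sorted({token for doc in documents for token in doc})
    doc_freq = Counter(token for doc in documents for token in set(doc))
    total_docs = len(documents)
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for an empty document
        vectors.append([
            (counts[token] / length) * math.log(total_docs / doc_freq[token])
            for token in vocabulary
        ])
    return vocabulary, vectors


def l2_normalize(vector):
    """Scale `vector` to an L2 norm of 1; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]


# Hypothetical tokenized employee profiles.
profiles = [
    ["python", "marketing", "python"],
    ["marketing", "sales"],
]
vocabulary, vectors = tf_idf(profiles)
unit_vectors = [l2_normalize(vector) for vector in vectors]

print(l2_normalize([2, 0, 0]))  # prints [1.0, 0.0, 0.0]
print([round(x, 3) for x in l2_normalize([0.1, 0.2, 0.3])])
```

A term that appears in every document, such as "marketing" here, receives a weight of zero under this variant because its inverse document frequency is log(1).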

The titles associated with employee profiles 306 will vary from company to company, but have some relationship to each other. For instance, a VP of Marketing and a Marketing Director would have roughly the same kind of job, even though their titles are different. In one or more embodiments, job title data are input through a single layer LSTM model prior to concatenation. In other embodiments, they are fed through a two-layer LSTM model prior to concatenation.

In one or more embodiments, the skills associated with employee profiles 306 are aggregated and stored as n-hot encoded values of skills 314. That is an aggregation of all of the skills of all of the employees. In one or more embodiments, the analysis of the employee profiles to get the skills information is done using Doc2Vec, which is “unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs, or entire documents” (see https://rare-technologies.com/doc2vec-tutorial/ as seen on Apr. 6, 2017).

In one or more embodiments, how a company uses technology 308 is characterized by what web technologies they use. In one embodiment, this information is gathered by a service which analyzes websites and automatically characterizes technologies used. In one or more embodiments, the technologies used are coded using n-hot encoding 316, each value associated with a different technology like AngularJS, JavaScript, NodeJS, etc. This is then output as a binary vector of ones and zeros 320, and put through autoencoding 324 prior to concatenating this with the other values 326.

In one or more embodiments, data about what technologies are used 308 in the company, the tech installed base, are gathered using a service such as HG Discovery, as described at https://www.hgdata.com/products/ on Apr. 5, 2017. This data can be used to find what technologies are used by a company. This can then be characterized as n-hot encoded values in addition to the n-hot encoded values for the web technologies 316.

News articles 336 about the company are gathered as well. The news articles are characterized by timestamp and a collection of terms. In one or more embodiments, the news article terms are characterized 340 using a tool such as Doc2Vec (see https://rare-technologies.com/doc2vec-tutorial/ as seen on Apr. 6, 2017). The results of this can either be run through an autoencoder 342 or run through an LSTM network, after which they are joined to the rest of the inputs through the concatenate layer 326.

An autoencoder takes a long vector and uses it to train a dense layer that compresses the vector into something much smaller. The network then has a larger layer, the size of the original vector, which is trained to predict the original input from this smaller layer (see https://en.wikipedia.org/wiki/Autoencoder as seen on Apr. 17, 2017).
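The idea can be illustrated with a deliberately small, pure Python sketch: a 6-element input is compressed by an encoder layer to 2 values, and a decoder layer tries to reconstruct the 6 original values. The layers here are linear with no bias terms, the training uses plain stochastic gradient descent on squared error, and the layer sizes, learning rate, and sample data are all assumptions rather than values taken from the specification.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def train_autoencoder(data, hidden, epochs=200, lr=0.05, seed=0):
    """Train a linear autoencoder; return (encoder, decoder, loss_history)."""
    rng = random.Random(seed)
    n = len(data[0])
    encoder = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden)]
    decoder = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            code = matvec(encoder, x)          # compressed representation
            output = matvec(decoder, code)     # attempted reconstruction
            error = [o - t for o, t in zip(output, x)]
            total += sum(e * e for e in error)
            # Gradient with respect to the code, using the pre-update decoder.
            grad_code = [
                sum(decoder[i][j] * error[i] for i in range(n))
                for j in range(hidden)
            ]
            for i in range(n):
                for j in range(hidden):
                    decoder[i][j] -= lr * error[i] * code[j]
            for j in range(hidden):
                for k in range(n):
                    encoder[j][k] -= lr * grad_code[j] * x[k]
        history.append(total / len(data))
    return encoder, decoder, history


# Hypothetical n-hot input vectors of length 6.
samples = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
]
encoder, decoder, history = train_autoencoder(samples, hidden=2)
compressed = matvec(encoder, samples[0])
print(len(compressed), history[0], history[-1])
```

After training, only the encoder output would be passed on to the concatenate layer; the decoder exists to provide a training signal.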

In one or more embodiments, the probability of a successful relationship between companies based on the above data can be found using a deep learning network. A deep learning network is an apparatus which enables one to find patterns in data where none was seen.

In one or more embodiments, the concatenate layer 326 is a wide network layer which accepts all of the inputs from the prior layers associated with each input. In one or more embodiments, this is a single CNN layer.

The layers following the concatenate layer are dense layers with pooling 328, such that the number of nodes may decrease, stay the same, or increase in successive layers. This continues until all nodes are attached to the final layer. The final layer is a one-wide dense layer 330, which goes to an activation layer 332. In one or more embodiments, the activation layer is a sigmoid, in others a ReLU node. The output of the activation layer is a prediction of whether or not a relationship will work as a probability value between 0.0 and 1.0.
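The flow from concatenation through dense layers to a single probability can be sketched as a forward pass in plain Python. This sketch assumes ReLU activations on the hidden layers and a sigmoid on the final one-wide layer, and it omits pooling; the weights, layer sizes, and feature vectors are placeholders, not trained values.

```python
import math


def dense(inputs, weights, biases):
    """Apply one dense layer: one weight row and one bias per output node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def predict_probability(feature_groups, layers):
    """Concatenate the feature groups and run them through the layers.

    `layers` is a list of (weights, biases); the last layer must have one node.
    """
    activations = [x for group in feature_groups for x in group]  # concatenate
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    (logit,) = dense(activations, weights, biases)
    return sigmoid(logit)


# Hypothetical encoded inputs: industry (one-hot), size (one-hot), tech (n-hot).
groups = [[0, 0, 1], [0, 1], [1, 0, 0, 1]]

# Placeholder weights: 9 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.2] * 9, [-0.1] * 9], [0.0, 0.1]),
    ([[1.5, -0.5]], [-0.3]),
]
probability = predict_probability(groups, layers)
print(probability)
```

The sigmoid guarantees an output strictly between 0.0 and 1.0, which is what allows it to be read as a probability.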

FIG. 1 shows the workflow necessary to train the system using the Training Module 406. The system would accept structured data 102 and unstructured data 104 to characterize both the interested and prospective companies. The system would also accept the known relationship data between the interested and prospective companies 106. The data is fed into the model 108 and the model is trained. The resultant model parameters or weights associated with each node 110 are stored for later use.

FIG. 2 shows the workflow necessary to evaluate the relationship between an interested company and prospective company after the model was trained on companies with known relationships with the interested company using the Prediction Module 428. Structured data is collected associated with the prospective company as described previously 202. Unstructured data is collected associated with the prospective company as described previously 204. The model parameters are set as determined by the trained model for the interested company in FIG. 1 208. The input data is fed into the trained model 206 in the Training Module 406. The input data is then processed in the trained model 210, and the prediction as to whether or not a relationship would work is output from the model 212.
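The two workflows can be sketched end to end in Python with a much simpler stand-in for the network: a single sigmoid node trained by gradient descent. The training step updates the weights from known relationships, the parameters are stored as JSON, and the prediction step loads them and scores a new company without changing any weight. The feature vectors, labels, learning rate, and JSON layout are assumptions for illustration.

```python
import json
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def train(examples, epochs=500, lr=0.5, seed=0):
    """Fit one sigmoid node; `examples` is a list of (features, label) pairs."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.1, 0.1) for _ in examples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of cross-entropy w.r.t. the pre-activation
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return {"weights": weights, "bias": bias}


def predict(params, features):
    """Score one company with stored parameters; no weights are modified."""
    total = params["bias"] + sum(
        w * x for w, x in zip(params["weights"], features)
    )
    return sigmoid(total)


# Hypothetical encoded companies with known outcomes (1 = successful).
known = [
    ([1, 0, 1, 0], 1),
    ([1, 0, 0, 1], 1),
    ([0, 1, 1, 0], 0),
    ([0, 1, 0, 1], 0),
]

stored = json.dumps(train(known))   # training workflow: store the parameters
params = json.loads(stored)         # prediction workflow: load them again
likely = predict(params, [1, 0, 1, 0])
unlikely = predict(params, [0, 1, 0, 1])
print(likely, unlikely)
```

Keeping training and prediction as separate functions that share only the stored parameters mirrors the separation between the Training Module 406 and the Prediction Module 428.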

What is claimed is:

1) A system for generating a marketing campaign based on relationship data between an interested company and a prospective company, the system comprising: a computer, a UI module, coupled to the computer, configured to accept and request data from one or more external data sources, a Raw Data Store module, coupled to the computer and the UI module, configured to accept structured or unstructured data in various standard data formats such as Comma Separated Variable and JSON and save the data, a Customer Data Store module, coupled to the computer, configured to accept data from interested companies including data about prospective companies, a Transfer API module, coupled to the computer and the UI module, configured to accept data from the UI module and store the information in the Raw Data Store module, a training module, coupled to the computer, configured to describe a neural network, configured to accept corporate data and relationship data, calculate a set of coefficients for the neural network from the company and relationship data, and store the coefficients into the storage module, a Semantic indexing module, coupled to the computer, configured to accept company information from the training module, configured to accept requests to compare two prospective companies, a Prediction module, coupled to the computer, configured to accept prediction parameters from the storage module and generate an output for each prospective company in the storage module, and a Campaign API module, coupled to the computer, configured to accept output for each prospective company and outputs to marketing tools such as Twitter, Google Ads and Facebook Ads.

2) The system in claim 1, further comprising a Data Crawl Module, the Data Crawl Module coupled to the computer, the Data Crawl Module configured to accept html data from one or more web sites, to extract information from those web sites and store the information in the Customer Data Store.

3) The system in claim 1, further comprising a Data API Module, the Data API Module coupled to the computer, the Data API module configured to accept data from one or more sources via an internet API such as the ZoomInfo API.

4) The system in claim 1, further comprising a Partner Dump Module, the Partner Dump Module coupled to the computer, the Partner Dump module configured to accept data from various partners and store said data in the Raw Data Store.

5) A method for generating a marketing campaign between an interested company and a prospective company, the method using: a computer, a UI module, coupled to the computer, configured to accept and request data from one or more external data sources, a Raw Data Store module, coupled to the computer and the UI module, configured to accept structured or unstructured data in various standard data formats such as Comma Separated Variable and JSON and save the data, a Customer Data Store module, coupled to the computer, configured to accept data from interested companies including data about prospective companies, a Transfer API module, coupled to the computer and the UI module, configured to accept data from the UI module and store the information in the Raw Data Store module, a training module, coupled to the computer, configured to describe a neural network, configured to accept corporate data and relationship data, calculate a set of coefficients for the neural network from the company and relationship data, and store the coefficients into the storage module, a Semantic indexing module, coupled to the computer, configured to accept company information from the training module, configured to accept requests to compare two prospective companies, a Prediction module, coupled to the computer, configured to accept prediction parameters from the storage module and generate an output for each prospective company in the storage module, and a Campaign API module, coupled to the computer, configured to accept output for each prospective company and outputs to marketing tools such as Twitter, Google Ads and Facebook Ads, the method comprising: accepting structured and unstructured data from external data sources about interested and prospective companies, training a neural network based on structured and unstructured data using companies with known relationships with the interested company and generating sets of coefficients, comparing prospective companies with the interested companies based on the neural network coefficients, and generating campaign data for those prospective companies.